917 results for Protein Structure, Multifractal Analysis, 6 Letter Model


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Genomic and proteomic analyses have attracted a great deal of interest in biological research in recent years. Many methods have been applied to discover useful information contained in the enormous databases of genomic sequences and amino acid sequences. The results of these investigations in turn inspire further research in biological fields. These biological sequences, which may be considered as multiscale sequences, have some specific features that require more refined methods to characterise. This project aims to study some of these biological challenges with multiscale analysis methods and a stochastic modelling approach. The first part of the thesis aims to cluster some unknown proteins, and classify their families as well as their structural classes. A development in proteomic analysis is concerned with the determination of protein functions. The first step in this development is to classify proteins and predict their families. This motivates us to study some unknown proteins from specific families, and to cluster them into families and structural classes. We select a large number of proteins from the same families or superfamilies, and link them to simulate some unknown large proteins from these families. We use multifractal analysis and the wavelet method to capture the characteristics of these linked proteins. The simulation results show that the method is valid for the classification of large proteins. The second part of the thesis aims to explore the relationship of proteins based on a layered comparison with their components. Many methods are based on homology of proteins because the resemblance at the protein sequence level normally indicates the similarity of functions and structures. However, some proteins may have similar functions with low sequence identity. We consider protein sequences at a detailed level to investigate the problem of comparison of proteins.
The comparison is based on the empirical mode decomposition (EMD), and protein sequences are detected with the intrinsic mode functions. A measure of similarity is introduced with a new cross-correlation formula. The similarity results show that the EMD is useful for detecting functional relationships between proteins. The third part of the thesis aims to investigate the transcriptional regulatory network of the yeast cell cycle via stochastic differential equations. As the investigation of genome-wide gene expression has become a focus in genomic analysis, researchers have tried to understand the mechanisms of the yeast genome for many years. How cells control gene expression still needs further investigation. We use a stochastic differential equation to model the expression profile of a target gene. We modify the model with a Gaussian membership function. For each target gene, a transcriptional rate is obtained, and the estimated transcriptional rate is also calculated with the information from five possible transcriptional regulators. Some regulators of these target genes are verified against the related references. With these results, we construct a transcriptional regulatory network for the genes from the yeast Saccharomyces cerevisiae. The construction of a transcriptional regulatory network is useful for detecting more mechanisms of the yeast cell cycle.
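
The abstract above models a target gene's expression profile with a stochastic differential equation but does not state its form. The sketch below is therefore only an illustration under an assumed linear production-and-decay model, dx = (rate - decay * x) dt + sigma dW, integrated with the Euler-Maruyama scheme. The function name, parameter names, and all numeric values are hypothetical.

```python
import random
from math import sqrt


def simulate_expression(x0, rate, decay, sigma, dt, steps, seed=0):
    """Euler-Maruyama path of dx = (rate - decay * x) dt + sigma dW.

    The model form is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    path = [x0]
    for _ in range(steps):
        x = path[-1]
        # Wiener increment over one time step: N(0, 1) scaled by sqrt(dt).
        noise = rng.gauss(0.0, 1.0) * sqrt(dt)
        path.append(x + (rate - decay * x) * dt + sigma * noise)
    return path


# With sigma = 0 the scheme reduces to forward Euler and relaxes towards rate / decay.
deterministic = simulate_expression(x0=0.0, rate=2.0, decay=1.0, sigma=0.0, dt=0.01, steps=2000)
# Same drift with a hypothetical noise level.
noisy = simulate_expression(x0=0.0, rate=2.0, decay=1.0, sigma=0.3, dt=0.01, steps=2000)
```

Fixing the seed keeps the noisy path reproducible, which is convenient when comparing estimated transcriptional rates across runs.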

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The vast majority of known proteins have not yet been experimentally characterized and little is known about their function. The design and implementation of computational tools can provide insight into the function of proteins based on their sequence, their structure, their evolutionary history and their association with other proteins. Knowledge of the three-dimensional (3D) structure of a protein can lead to a deep understanding of its mode of action and interaction, but currently the structures of <1% of sequences have been experimentally solved. For this reason, it has become urgent to develop new methods that are able to computationally extract relevant information from protein sequence and structure. The starting point of my work has been the study of the properties of contacts between protein residues, since they constrain protein folding and characterize different protein structures. Prediction of residue contacts in proteins is an interesting problem whose solution may be useful in protein folding recognition and de novo design. The prediction of these contacts requires the study of the protein inter-residue distances related to the specific type of amino acid pair that are encoded in the so-called contact map. An interesting new way of analyzing those structures emerged when network studies were introduced, with pivotal papers demonstrating that protein contact networks also exhibit small-world behavior. In order to highlight constraints for the prediction of protein contact maps and for applications in the field of protein structure prediction and/or reconstruction from experimentally determined contact maps, I studied the extent to which the characteristic path length and clustering coefficient of the protein contact network are values that reveal characteristic features of protein contact maps.
Provided that residue contacts are known for a protein sequence, the major features of its 3D structure could be deduced by combining this knowledge with correctly predicted motifs of secondary structure. In the second part of my work I focused on a particular protein structural motif, the coiled-coil, known to mediate a variety of fundamental biological interactions. Coiled-coils are found in a variety of structural forms and in a wide range of proteins including, for example, small units such as leucine zippers that drive the dimerization of many transcription factors or more complex structures such as the family of viral proteins responsible for virus-host membrane fusion. The coiled-coil structural motif is estimated to account for 5-10% of the protein sequences in the various genomes. Given their biological importance, in my work I introduced a Hidden Markov Model (HMM) that exploits the evolutionary information derived from multiple sequence alignments, to predict coiled-coil regions and to discriminate coiled-coil sequences. The results indicate that the new HMM outperforms all the existing programs and can be adopted for coiled-coil prediction and for large-scale genome annotation. Genome annotation is a key issue in modern computational biology, being the starting point towards the understanding of the complex processes involved in biological networks. The rapid growth in the number of protein sequences and structures available poses new fundamental problems that still deserve an interpretation. Nevertheless, these data are at the basis of the design of new strategies for tackling problems such as the prediction of protein structure and function. Experimental determination of the functions of all these proteins would be a hugely time-consuming and costly task and, in most instances, has not been carried out. For example, only about 20% of annotated proteins in the Homo sapiens genome have currently been experimentally characterized.
A commonly adopted procedure for annotating protein sequences relies on "inheritance through homology", based on the notion that similar sequences share similar functions and structures. This procedure consists in the assignment of sequences to a specific group of functionally related sequences which had been grouped through clustering techniques. The clustering procedure is based on suitable similarity rules, since predicting protein structure and function from sequence largely depends on the value of sequence identity. However, additional levels of complexity are due to multi-domain proteins, to proteins that share common domains but that do not necessarily share the same function, and to the finding that different combinations of shared domains can lead to different biological roles. In the last part of this study I developed and validated a system that contributes to sequence annotation by taking advantage of a validated transfer-through-inheritance procedure for the molecular functions and the structural templates. After a cross-genome comparison with the BLAST program, clusters were built on the basis of two stringent constraints on sequence identity and coverage of the alignment. The adopted measure explicitly addresses the problem of multi-domain protein annotation and allows a fine-grained division of the whole set of proteomes used, which ensures cluster homogeneity in terms of sequence length. A high level of coverage of structure templates on the length of protein sequences within clusters ensures that multi-domain proteins, when present, can be templates for sequences of similar length. This annotation procedure includes the possibility of reliably transferring statistically validated functions and structures to sequences, considering the information available in the present databases of molecular functions and structures.
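
To make the two network measures named in this abstract concrete, the following sketch builds a residue contact graph from 3D coordinates and computes the average clustering coefficient and the characteristic path length. It is a minimal illustration, not the author's pipeline: the 8 Å distance cutoff, the function names, and the toy coordinates are assumptions.

```python
from collections import deque
from itertools import combinations
from math import dist


def contact_graph(coords, cutoff=8.0):
    """Adjacency sets linking residues whose coordinates lie within `cutoff` (assumed value)."""
    graph = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if dist(coords[i], coords[j]) <= cutoff:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def clustering_coefficient(graph):
    """Mean over nodes of the fraction of neighbour pairs that are themselves in contact."""
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph)


def characteristic_path_length(graph):
    """Mean shortest-path length over all connected node pairs (BFS from every node)."""
    total, pairs = 0, 0
    for source in graph:
        seen = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in seen:
                    seen[nbr] = seen[node] + 1
                    queue.append(nbr)
        total += sum(seen.values())
        pairs += len(seen) - 1
    return total / pairs if pairs else 0.0


# Toy coordinates (hypothetical, not taken from any real structure).
toy = [(0, 0, 0), (5, 0, 0), (10, 0, 0), (5, 5, 0)]
g = contact_graph(toy)
c = clustering_coefficient(g)
length = characteristic_path_length(g)
```

A high clustering coefficient combined with a short characteristic path length is the usual signature of the small-world behavior the abstract refers to.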

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Site-directed mutagenesis and combinatorial libraries are powerful tools for providing information about the relationship between protein sequence and structure. Here we report two extensions that expand the utility of combinatorial mutagenesis for the quantitative assessment of hypotheses about the determinants of protein structure. First, we show that resin-splitting technology, which allows the construction of arbitrarily complex libraries of degenerate oligonucleotides, can be used to construct more complex protein libraries for hypothesis testing than can be constructed from oligonucleotides limited to degenerate codons. Second, using eglin c as a model protein, we show that regression analysis of activity scores from library data can be used to assess the relative contributions to the specific activity of the amino acids that were varied in the library. The regression parameters derived from the analysis of a 455-member sample from a library wherein four solvent-exposed sites in an α-helix can contain any of nine different amino acids are highly correlated (P < 0.0001, R² = 0.97) to the relative helix propensities for those amino acids, as estimated by a variety of biophysical and computational techniques.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The Fluid–Structure Interaction (FSI) problem is significant in science and engineering and poses challenges for computational mechanics. The coupled model of Finite Element and Smoothed Particle Hydrodynamics (FE-SPH) is a robust technique for the simulation of FSI problems. However, two important steps in the coupled FE-SPH model, neighbor searching and contact searching, are extremely time-consuming. The Point-In-Box (PIB) searching algorithm was developed by Swegle to improve the efficiency of searching. However, it has the shortcoming that the efficiency of searching can be significantly affected by the distribution of points (nodes in FEM and particles in SPH). In this paper, in order to improve the efficiency of searching, a novel Striped-PIB (S-PIB) searching algorithm is proposed to overcome the shortcoming of the PIB algorithm that is caused by the distribution of points, and the two time-consuming steps of neighbor searching and contact searching are integrated into one searching step. The accuracy and efficiency of the newly developed searching algorithm are studied through an efficiency test and FSI problems. It has been found that the newly developed model can significantly improve the computational efficiency, and it is believed to be a powerful tool for FSI analysis.
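
The abstract does not describe the PIB or S-PIB algorithms in enough detail to reproduce them. Purely to show why spatial bucketing speeds up neighbor searching, the sketch below uses a generic uniform-grid (cell list) search in 2D, which is a different and simpler technique. The function name, the search radius, and the particle positions are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor


def grid_neighbors(points, radius):
    """Return {index: sorted neighbour indices} for 2D points within `radius`.

    Generic uniform-grid (cell list) search; NOT the PIB or S-PIB algorithm.
    """
    # Bucket every point into a square cell whose side equals the search radius.
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(floor(x / radius), floor(y / radius))].append(idx)

    neighbors = {idx: [] for idx in range(len(points))}
    for idx, (x, y) in enumerate(points):
        cx, cy = floor(x / radius), floor(y / radius)
        # Only the 3x3 block of cells around the point can hold neighbours.
        for dx, dy in product((-1, 0, 1), repeat=2):
            for other in cells.get((cx + dx, cy + dy), ()):
                if other != idx and dist(points[idx], points[other]) <= radius:
                    neighbors[idx].append(other)
        neighbors[idx].sort()
    return neighbors


# Hypothetical particle positions for illustration only.
particles = [(0.0, 0.0), (0.5, 0.0), (2.0, 2.0), (2.4, 2.1)]
result = grid_neighbors(particles, radius=1.0)
```

Like the PIB shortcoming noted in the abstract, a uniform grid degrades when points cluster into a few cells, which is the kind of distribution sensitivity the S-PIB variant is said to address.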

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Communication within and across proteins is crucial for the biological functioning of proteins. Experiments such as mutational studies on proteins provide important information on the amino acids that are crucial for their function. However, protein structures are complex and it is unlikely that the entire responsibility for the function rests on only a few amino acids. A large fraction of the protein is expected to participate in its function at some level or other. Thus, it is relevant to consider the protein structures as a completely connected network and then deduce the properties that are related to the global network features. In this direction, our laboratory has been engaged in representing the protein structure as a network of non-covalent connections and we have investigated a variety of problems in structural biology, such as the identification of functional and folding clusters, determinants of quaternary association and characterization of the network properties of protein structures. We have also addressed a few important issues related to protein dynamics, such as the process of oligomerization in multimers, the mechanism of protein folding, and ligand-induced communications (allosteric effect). In this review we highlight some of the investigations which we have carried out in the recent past. A review on protein structure graphs was presented earlier, in which the focus was on the graphs and graph spectral properties and their implementation in the study of protein structure graphs/networks (PSN). In this article, we briefly summarize the relevant parts of the methodology and the focus is on the advancement brought about in the understanding of protein structure-function relationships through structure networks. The investigations of structural/biological problems are divided into two parts, in which the first part deals with the analysis of PSNs based on static structures obtained from X-ray crystallography.
The second part highlights the changes in the network, associated with biological functions, which are deduced from the network analysis on the structures obtained from molecular dynamics simulations.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Aminoacyl-tRNA synthetases (aaRS) are enzymes crucial in the translation of the genetic code. The enzyme acylates the acceptor stem of tRNA with the cognate amino acid bound at the active site when the anti-codon is recognized by the anti-codon site of aaRS. In a typical aaRS, the distance between the anti-codon region and the aminoacylation site is approximately 70 Å. We have investigated this allosteric phenomenon at the molecular level by MD simulations followed by the analysis of protein structure networks (PSN) of non-covalent interactions. Specifically, we have generated conformational ensembles by performing MD simulations on different liganded states of methionyl tRNA synthetase (MetRS) from Escherichia coli and tryptophanyl tRNA synthetase (TrpRS) from human. The correlated residues during the MD simulations are identified by cross-correlation maps. We have identified the amino acids connecting the correlated residues by the shortest path between the two selected members of the PSN. The frequencies of paths have been evaluated from the MD snapshots [1]. The conformational populations in different liganded states of the protein have been beautifully captured in terms of network parameters such as hubs, cliques and communities [2]. These parameters have been associated with the rigidity and plasticity of the protein conformations and can be associated with the free energy landscape. A comparison of allosteric communication in MetRS and TrpRS [3] elucidated in this study highlights diverse means adopted by different enzymes to perform a similar function. The computational method described for these two enzymes can be applied to the investigation of allostery in other systems.
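
The step of connecting two correlated residues by the shortest path in the PSN can be sketched as a breadth-first search over an unweighted graph. This is an assumed, simplified reading of that step: the abstract does not say whether edges are weighted, and the residue labels and contacts below are invented for illustration.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest node path, or None if unreachable."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nbr in graph.get(node, ()):
            if nbr not in parents:
                parents[nbr] = node
                queue.append(nbr)
    return None


# Hypothetical residue network: keys are residue labels, values are non-covalent contacts.
psn = {
    "A10": ["L25", "K40"],
    "L25": ["A10", "F77"],
    "K40": ["A10"],
    "F77": ["L25", "W120"],
    "W120": ["F77"],
}
path = shortest_path(psn, "A10", "W120")
```

Repeating the search on the network built from each MD snapshot and counting how often a given path occurs would give path frequencies of the kind the abstract mentions.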

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Encoding protein 3D structures into 1D strings using short structural prototypes or structural alphabets opens a new front for structure comparison and analysis. Using the well-documented 16 motifs of Protein Blocks (PBs) as a structural alphabet, we have developed a methodology to compare protein structures that are encoded as sequences of PBs by aligning them using dynamic programming with a substitution matrix for PBs. This methodology is implemented in the applications available on the Protein Block Expert (PBE) server. PBE addresses common issues in the field of protein structure analysis such as comparison of protein structures and identification of protein structures in structural databanks that resemble a given structure. PBE-T provides a facility to transform any PDB file into sequences of PBs. PBE-ALIGNc performs comparison of two protein structures based on the alignment of their corresponding PB sequences. PBE-ALIGNm is a facility for mining the SCOP database for similar structures based on the alignment of PBs. In addition, PBE provides an interface to a database (PBE-SAdb) of preprocessed PB sequences from SCOP culled at 95% and of all-against-all pairwise PB alignments at family and superfamily levels. The PBE server is freely available at http://bioinformatics.univ-reunion.fr/PBE/.
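
Aligning two PB-encoded sequences by dynamic programming with a substitution score can be sketched with the classic Needleman-Wunsch global alignment. This is not the PBE implementation: the scoring function, the gap penalty, and the example strings are hypothetical placeholders, and a real run would use the PB substitution matrix the abstract refers to.

```python
def align(seq_a, seq_b, score, gap=-2):
    """Needleman-Wunsch global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(seq_a), len(seq_b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(seq_a[i - 1], seq_b[j - 1]),  # substitute
                dp[i - 1][j] + gap,  # gap in seq_b
                dp[i][j - 1] + gap,  # gap in seq_a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        diagonal = (
            i > 0 and j > 0
            and dp[i][j] == dp[i - 1][j - 1] + score(seq_a[i - 1], seq_b[j - 1])
        )
        if diagonal:
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return dp[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def toy_score(x, y):
    """Hypothetical substitution score: +2 for identical symbols, -1 otherwise."""
    return 2 if x == y else -1


# Placeholder PB strings; real input would come from a structure-to-PB encoding step.
best, row_a, row_b = align("mmmd", "mmd", toy_score)
```

Passing the substitution score as a function keeps the sketch independent of any particular matrix; a dictionary lookup keyed on PB pairs could be swapped in without changing the recurrence.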

Relevance:

100.00% 100.00%

Publisher:

Abstract:

On the basis of the quantitative relationships among rubber processing, structure and properties, a methodology for integrated processing-structure-property analysis of rubber in-mold vulcanization is presented. The temporal evolution and spatial distribution of the hot processing parameters, crosslinking structure parameters and mechanical property parameters of silicone rubber are then obtained by means of the finite element method. The present work is helpful for optimizing curing conditions and for designing rubber vulcanization processes that meet given requirements.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

We propose a new characterization of protein structure based on the natural tetrahedral geometry of the β carbon and a new geometric measure of structural similarity, called visible volume. In our model, the side-chains are replaced by an ideal tetrahedron, the orientation of which is fixed with respect to the backbone and corresponds to the preferred rotamer directions. Visible volume is a measure of the non-occluded empty space surrounding each residue position after the side-chains have been removed. It is a robust, parameter-free, locally-computed quantity that accounts for many of the spatial constraints that are of relevance to the corresponding position in the native structure. When computing visible volume, we ignore the nature of both the residue observed at each site and the ones surrounding it. We focus instead on the space that, together, these residues could occupy. By doing so, we are able to quantify a new kind of invariance beyond the apparent variations in protein families, namely, the conservation of the physical space available at structurally equivalent positions for side-chain packing. Corresponding positions in native structures are likely to be of interest in protein structure prediction, protein design, and homology modeling. Visible volume is related to the degree of exposure of a residue position and to the actual rotamers in native proteins. In this article, we discuss the properties of this new measure, namely, its robustness with respect to both crystallographic uncertainties and naturally occurring variations in atomic coordinates, and the remarkable fact that it is essentially independent of the choice of the parameters used in calculating it. We also show how visible volume can be used to align protein structures, to identify structurally equivalent positions that are conserved in a family of proteins, and to single out positions in a protein that are likely to be of biological interest. 
These properties qualify visible volume as a powerful tool in a variety of applications, from the detailed analysis of protein structure to homology modeling, protein structural alignment, and the definition of better scoring functions for threading purposes.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Helicobacter pylori is a gastric pathogen which infects ~50% of the global population and can lead to the development of gastritis, gastric and duodenal ulcers and carcinoma. Genome sequencing of H. pylori revealed high levels of genetic variability; this pathogen is known for its adaptability due to mechanisms including phase variation, recombination and horizontal gene transfer. Motility is essential for efficient colonisation by H. pylori. The flagellum is a complex nanomachine which has been studied in detail in E. coli and Salmonella. In H. pylori, key differences have been identified in the regulation of flagellum biogenesis, warranting further investigation. In this study, the genomes of two H. pylori strains (CCUG 17874 and P79) were sequenced and published as draft genome sequences. Comparative studies identified the potential role of restriction modification systems and the comB locus in transformation efficiency differences between these strains. Core genome analysis of 43 H. pylori strains including 17874 and P79 defined a more refined core genome for the species than previously published. Comparative analysis of the genome sequences of strains isolated from individuals suffering from H. pylori related diseases resulted in the identification of “disease-specific” genes. Structure-function analysis of the essential motility protein HP0958 was performed to elucidate its role during flagellum assembly in H. pylori. The previously reported HP0958-FliH interaction could not be substantiated in this study and appears to be a false positive. Site-directed mutagenesis confirmed that the coiled-coil domain of HP0958 is involved in the interaction with RpoN (74-284), while the Zn-finger domain is required for direct interaction with the full length flaA mRNA transcript. Complementation of a non-motile hp0958-null derivative strain of P79 with site-directed mutant alleles of hp0958 resulted in cells producing flagellar-type extrusions from non-polar positions. 
Thus, HP0958 may have a novel function in spatial localisation of flagella in H. pylori.